-
The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these models. Without such data, the MSR community cannot comprehensively understand the impact of PTM adoption and reuse. This paper presents the PeaTMOSS dataset, which comprises metadata for 281,638 PTMs and detailed snapshots for all PTMs with over 50 monthly downloads (14,296 PTMs), along with 28,575 open-source software repositories from GitHub that utilize these models. Additionally, the dataset includes 44,337 mappings from 15,129 downstream GitHub repositories to the 2,530 PTMs they use. To enhance the dataset’s comprehensiveness, we developed prompts for a large language model to automatically extract model metadata, including the model’s training datasets, parameters, and evaluation metrics. Our analysis of this dataset provides the first summary statistics for the PTM supply chain, showing the trend of PTM development and common shortcomings of PTM package documentation. Our example application reveals inconsistencies in software licenses across PTMs and their dependent projects. PeaTMOSS lays the foundation for future research, offering rich opportunities to investigate the PTM supply chain. We outline mining opportunities on PTMs, their downstream usage, and cross-cutting questions. Our artifact is available at https://github.com/PurdueDualityLab/PeaTMOSS-Artifact. Our dataset is available at https://transfer.rcac.purdue.edu/file-manager?origin_id=ff978999-16c2-4b50-ac7a-947ffdc3eb1d&origin_path=%2F.
-
Visualizing atomic-orbital degrees of freedom is a frontier challenge in scanned microscopy. Some types of orbital order are virtually imperceptible to normal scattering techniques because they do not reduce the overall crystal lattice symmetry. A good example is $d_{xz}/d_{yz}$ (π,π) orbital order in tetragonal lattices. For enhanced detectability, here we consider the quasiparticle scattering interference (QPI) signature of such (π,π) orbital order in both normal and superconducting phases. The theory reveals that sublattice-specific QPI signatures generated by the orbital order should emerge strongly in the superconducting phase. Sublattice-resolved QPI visualization in superconducting CeCoIn$_5$ then reveals two orthogonal QPI patterns at lattice-substitutional impurity atoms. We analyze the energy dependence of these two orthogonal QPI patterns and find the intensity peaked near E = 0, as predicted when such (π,π) orbital order is intertwined with $d$-wave superconductivity. Sublattice-resolved superconductive QPI techniques thus represent a new approach for study of hidden orbital order.
-
Regular expressions are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS (Regular Expression Denial of Service), caused by a super-linear worst-case execution time during regex matching. Due to the severity and prevalence of ReDoS, past work proposed automatic tools to detect and fix regexes. Although these tools were evaluated in automatic experiments, their usability has not yet been studied; usability has not been a focus of prior work. Our insight is that the usability of existing tools to detect and fix regexes will improve if we complement them with anti-patterns and fix strategies of vulnerable regexes. We developed novel anti-patterns for vulnerable regexes, and a collection of fix strategies to fix them. We derived our anti-patterns and fix strategies from a novel theory of regex infinite ambiguity, a necessary condition for regexes vulnerable to ReDoS. We proved the soundness and completeness of our theory. We evaluated the effectiveness of our anti-patterns, both in an automatic experiment and when applied manually. Then, we evaluated how much our anti-patterns and fix strategies improve developers’ understanding of the outcome of detection and fixing tools. Our evaluation found that our anti-patterns were effective over a large dataset of regexes (N=209,188): 100% precision and 99% recall, improving on the state of the art's 50% precision and 87% recall. Our anti-patterns were also more effective than the state of the art when applied manually (N=20): 100% of developers applied them effectively vs. 50% for the state of the art. Finally, our anti-patterns and fix strategies increased developers’ understanding using automatic tools (N=9): from median “Very weakly” to median “Strongly” when detecting vulnerabilities, and from median “Very weakly” to median “Very strongly” when fixing them.
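To make the idea of ambiguity behind ReDoS concrete, the following is a minimal sketch, not the paper's anti-patterns or fix strategies. The regex `(a+)+` and its rewrite `a+` are illustrative assumptions chosen here, as are the helper names `count_parses` and `same_language`. The sketch counts how many distinct ways the nested quantifier can split a run of `a` characters, which is the search space a backtracking engine may have to explore when the overall match fails, and then checks on short inputs that the rewrite accepts the same strings.

```python
import re
from functools import lru_cache
from itertools import product

# Illustrative regexes (assumptions, not taken from the paper).
VULNERABLE = re.compile(r"(a+)+")  # nested quantifier: many parses per input
REPAIRED = re.compile(r"a+")       # same language, one parse per input


@lru_cache(maxsize=None)
def _ways(remaining: int) -> int:
    """Ways to split `remaining` characters into zero or more non-empty groups."""
    if remaining == 0:
        return 1
    # The next group of the inner a+ may consume 1..remaining characters.
    return sum(_ways(remaining - k) for k in range(1, remaining + 1))


def count_parses(n: int) -> int:
    """Number of ways (a+)+ can match a run of n 'a' characters."""
    if n == 0:
        return 0  # (a+)+ needs at least one non-empty group
    return _ways(n)


def same_language(max_len: int = 6) -> bool:
    """Check that both regexes agree on every string over {a, b} up to max_len."""
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            text = "".join(chars)
            if bool(VULNERABLE.fullmatch(text)) != bool(REPAIRED.fullmatch(text)):
                return False
    return True


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, count_parses(n))
    print("same language on short inputs:", same_language())
```

The parse count doubles with each extra character, while the rewritten regex has exactly one way to match each accepted string. The equivalence check is deliberately limited to short inputs so that the vulnerable regex stays cheap to run.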
-
An unidentified quantum fluid designated the pseudogap (PG) phase is produced by electron-density depletion in the CuO$_2$ antiferromagnetic insulator. Current theories suggest that the PG phase may be a pair density wave (PDW) state characterized by a spatially modulating density of electron pairs. Such a state should exhibit a periodically modulating energy gap $\Delta_{\mathrm{P}}(\boldsymbol{r})$ in real-space, and a characteristic quasiparticle scattering interference (QPI) signature $\Lambda_{\mathrm{P}}(\boldsymbol{q})$ in wavevector space. By studying strongly underdoped Bi$_2$Sr$_2$CaDyCu$_2$O$_8$ at hole-density ~0.08 in the superconductive phase, we detect the $8a_0$-periodic $\Delta_{\mathrm{P}}(\boldsymbol{r})$ modulations signifying a PDW coexisting with superconductivity. Then, by visualizing the temperature dependence of this electronic structure from the superconducting into the pseudogap phase, we find the evolution of the scattering interference signature $\Lambda(\boldsymbol{q})$ that is predicted specifically for the temperature dependence of an $8a_0$-periodic PDW. These observations are consistent with theory for the transition from a PDW state coexisting with $d$-wave superconductivity to a pure PDW state in the Bi$_2$Sr$_2$CaDyCu$_2$O$_8$ pseudogap phase.
-
Complete theoretical understanding of the most complex superconductors requires a detailed knowledge of the symmetry of the superconducting energy gap $\Delta_{\mathbf{k}}^{\alpha}$, for all momenta $\mathbf{k}$ on the Fermi surface of every band $\alpha$. While there are a variety of techniques for determining $|\Delta_{\mathbf{k}}^{\alpha}|$, no general method existed to measure the signed values of $\Delta_{\mathbf{k}}^{\alpha}$. Recently, however, a technique based on phase-resolved visualization of superconducting quasiparticle interference (QPI) patterns, centered on a single non-magnetic impurity atom, was introduced. In principle, energy-resolved and phase-resolved Fourier analysis of these images identifies wavevectors connecting all $\mathbf{k}$-space regions where $\Delta_{\mathbf{k}}^{\alpha}$ has the same or opposite sign. But use of a single isolated impurity atom, from whose precise location the spatial phase of the scattering interference pattern must be measured, is technically difficult. Here we introduce a generalization of this approach for use with multiple impurity atoms, and demonstrate its validity by comparing the $\Delta_{\mathbf{k}}^{\alpha}$ it generates to the $\Delta_{\mathbf{k}}^{\alpha}$ determined from single-atom scattering in FeSe where s± energy-gap symmetry is established. Finally, to exemplify utility, we use the multi-atom technique on LiFeAs and find scattering interference between the hole-like and electron-like pockets as predicted for $\Delta_{\mathbf{k}}^{\alpha}$ of opposite sign.
-
The defining characteristic of hole-doped cuprates is $d$-wave high temperature superconductivity. However, intense theoretical interest is now focused on whether a pair density wave state (PDW) could coexist with cuprate superconductivity [D. F. Agterberg et al., Annu. Rev. Condens. Matter Phys. 11, 231 (2020)]. Here, we use a strong-coupling mean-field theory of cuprates to model the atomic-scale electronic structure of an eight-unit-cell periodic, $d$-symmetry form factor, pair density wave (PDW) state coexisting with $d$-wave superconductivity (DSC). From this PDW + DSC model, the atomically resolved density of Bogoliubov quasiparticle states $N(\boldsymbol{r}, E)$ is predicted at the terminal BiO surface of Bi$_2$Sr$_2$CaCu$_2$O$_8$ and compared with high-precision electronic visualization experiments using spectroscopic imaging scanning tunneling microscopy (STM). The PDW + DSC model predictions include the intraunit-cell structure and periodic modulations of $N(\boldsymbol{r}, E)$, the modulations of the coherence peak energy $\Delta_p(\boldsymbol{r})$, and the characteristics of Bogoliubov quasiparticle interference in scattering-wavevector space ($\boldsymbol{q}$-space). Consistency between all these predictions and the corresponding experiments indicates that lightly hole-doped Bi$_2$Sr$_2$CaCu$_2$O$_8$ does contain a PDW + DSC state. Moreover, in the model the PDW + DSC state becomes unstable to a pure DSC state at a critical hole density $p^*$, with empirically equivalent phenomena occurring in the experiments. All these results are consistent with a picture in which the cuprate translational symmetry-breaking state is a PDW, the observed charge modulations are its consequence, the antinodal pseudogap is that of the PDW state, and the cuprate critical point at $p^* \approx 19\%$ occurs due to disappearance of this PDW.